Parallel Text Processing: Alignment and Use of Translation Corpora

Authors

Jean Véronis

Maria Zimina

Abstract

In the past ten to fifteen years considerable progress has been made in the field of parallel text alignment. The term parallel text itself is now well-established within the computational linguistics community. It refers to texts accompanied by their translations in one or several languages. Aligned texts have proved to be an invaluable source of translation data for terminology banks and bilingual dictionaries. Translation alignment is currently providing the basis for the development of a new generation of tools to assist human translators and to improve the quality and productivity of their work.

Related resources

Extracting Parallel Corpora from Comparable Documents to Improve Translation Quality in Machine Translation Systems

Data used for training statistical machine translation methods are usually prepared from three kinds of resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...

JMaxAlign: A Maximum Entropy Parallel Sentence Alignment Tool

Parallel corpora are an extremely useful tool in many natural language processing tasks, particularly statistical machine translation. Parallel corpora for certain language pairs, such as Spanish or French, are widely available, but for many language pairs, such as Bengali and Chinese, it is impossible to find parallel corpora. Several tools have been developed to automatically extract parallel...

Using Parallel Corpora to Create a Greek-English Dictionary with Uplug

This paper presents the construction of a Greek-English bilingual dictionary from parallel corpora that were created manually by collecting documents retrieved from the Internet. The parallel corpora processing was performed by the Uplug word alignment system without the use of language specific information. A sample was extracted from the population of suggested translations and was included i...

Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora

A parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and also demands linguistic skill. In the present work, a technique has been developed that extracts parallel corpus between Manipuri, a morphologicall...

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English are hyphenated: a hyphen separates the different parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead of the half-space character. This common incorrect use of the space leads to some s...

Publication date: 2002
